PAAA: A Progressive Iterative Alignment Algorithm Based on Anchors
Abstract
1. INTRODUCTION

Multiple Sequence Alignment (MSA) is an important method for comparing biological sequences. It consists of maximising the number of matches between residues that occur in the same order in each sequence. MSA is an NP-complete problem [Wang and Jiang 94], and several approaches have been proposed to solve it. The progressive approach is the most widely used and the most effective. It operates in three steps:

(i) Pairwise comparison: this step assigns a similarity rate to each pair of sequences. The rate is expressed as a distance, computed either directly (i.e., without constructing a pairwise alignment) or from a pairwise alignment. The goal is to estimate the similarity between pairs of sequences in order to identify the closest sequences, which are aligned first. The distances computed between all pairs of sequences are stored in a symmetric matrix called the distance matrix. Commonly used distances include percent identity, percent similarity [Thompson et al. 99], the Kimura distance [Kimura 83], k-mer distances [Edgar 04], FDOD [Min et al. 05] and normalized scores [Wheeler and Kececioglu 07].

(ii) Sequence clustering: clustering the sequences defines the branching order that is followed when aligning sequences or profiles. The clustering method uses the distance matrix computed in the previous step. The most widely used approach is to construct a guide tree, for example with UPGMA [Sneath and Sokal 73] or Neighbor-Joining [Saitou and Nei 87].

(iii) Integration of the sequences: this step aligns the sequences following the branching order. Approaches include aligning alignments [Wheeler and Kececioglu 07] and profile-profile alignment [Gotoh 94], which is the most widely used. Several profile-profile methods exist, and each uses a different score for aligning profiles.

Progressive alignment algorithms include CLUSTALW [Thompson et al. 99], T-COFFEE [Notredame et al. 00], GRAMALIGN [Russell et al. 08] and KALIGN [Lassmann and Sonnhammer 05]. Global multiple alignment algorithms that adopt a progressive approach are fast, simple to implement and require little memory. However, they have two major drawbacks. (i) Restricting the comparison to two sequences at a time, rather than comparing all sequences simultaneously, does not allow characters common to a set of sequences to be taken into account. (ii) The resulting alignment depends on the order in which the sequences are aligned …
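Steps (i) and (ii) of the progressive approach can be illustrated with a minimal Python sketch. It makes two assumptions. The first is a simplified k-mer distance: one minus the fraction of shared k-mers, which loosely follows the idea behind [Edgar 04] rather than its exact formula. The second is textbook UPGMA with size-weighted averaging. All function names are hypothetical. The sketch is not the anchor-based PAAA method itself, and step (iii), profile alignment, is not shown.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Simplified k-mer distance (an assumption, not Edgar's exact formula):
    1 - shared k-mers / number of k-mers in the shorter sequence."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum(min(ca[w], cb[w]) for w in ca if w in cb)
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:  # a sequence shorter than k has no k-mers to share
        return 1.0
    return 1.0 - shared / denom


def distance_matrix(seqs, k=2):
    """Step (i): symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = kmer_distance(seqs[i], seqs[j], k)
    return d


def upgma(d, labels):
    """Step (ii): UPGMA guide tree as nested tuples, plus the merge order."""
    # Active clusters: id -> (subtree, size); distances keyed by a pair of ids.
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    dist = {frozenset((i, j)): d[i][j]
            for i, j in combinations(range(len(labels)), 2)}
    order = []
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair (first one on ties)
        u, v = sorted(pair)
        (tu, su), (tv, sv) = clusters.pop(u), clusters.pop(v)
        del dist[pair]
        for w in clusters:
            duw = dist.pop(frozenset((u, w)))
            dvw = dist.pop(frozenset((v, w)))
            # Size-weighted average of the merged clusters' distances to w.
            dist[frozenset((next_id, w))] = (su * duw + sv * dvw) / (su + sv)
        clusters[next_id] = ((tu, tv), su + sv)
        order.append((tu, tv))
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, order


seqs = ["ACGTACGT", "ACGTACGA", "TTGCATGC", "TTGCATGA"]
tree, order = upgma(distance_matrix(seqs, k=2), ["A", "B", "C", "D"])
print(tree)  # → (('A', 'B'), ('C', 'D'))
```

The list returned as `order` is the branching order that a progressive aligner would follow in step (iii): the closest pairs are merged first. It also shows why the final alignment depends on this order, as noted in drawback (ii).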
Similar resources
A multi-hop PSO based localization algorithm for wireless sensor networks
A sensor network consists of a large number of sensor nodes that are distributed in a large geographic environment to collect data. Localization is one of the key issues in wireless sensor network research because it is important to determine the location of an event. On the other hand, finding the location of a wireless sensor node by the Global Positioning System (GPS) is not appropriate du...
On improving APIT algorithm for better localization in WSN
In Wireless Sensor Networks (WSNs), localization algorithms can be range-based or range-free. The Approximate Point in Triangle (APIT) is a range-free approach. We propose a modification of the APIT algorithm, which we refer to as modified-APIT. We select suitable triangles with appropriate distance between anchors to reduce PIT test errors (edge effect and non-uniform placement of neighbours) in APIT a...
Multiple Sequence Alignment Tools: Assessing Performance of the Underlying Algorithms
Multiple sequence alignments have a primary role in several domains of modern molecular biology such as protein 3D structure/function prediction, phylogeny inference, molecular function, intermolecular interactions and many other common tasks in sequence analysis. Presently, many tools to construct multiple sequence alignments are available but none of them is accurate for all types of data sets....
A Sort-based Algorithm for Multiple Sequence Alignment
We propose a sort-based algorithm for multiple sequence alignment using anchors. Anchors are determined by the use of suffix sorting along with position-based sorts. Potential anchor points are identified by a careful exploitation of the sorted suffixes obtained from a generalized suffix array of the input sequences. Final alignment is obtained by a recursive application of the suffix-sorting a...
The PRALINE online server: optimising progressive multiple alignment on the web
We introduce the online server for PRALINE (http://ibium.cs.vu.nl/programs/pralinewww/), an iterative versatile progressive multiple sequence alignment (MSA) tool. PRALINE provides various MSA optimisation strategies including weighted global and local profile pre-processing, secondary structure-guided alignment and a reliability measure for aligned individual residue positions. The latter can ...
Publication date: 2011